A data-driven geospatial workflow to map species distributions for conservation assessments
Abstract
Aim
Species distribution maps are essential for assessing extinction risk and guiding conservation efforts. However, most come sourced as expert-drawn range maps with known issues of accuracy or are developed with overly complex modelling procedures. Thus, data-driven alternatives that are accessible and reliable are a welcome addition to the spatial conservation toolkit. Here, we developed a geospatial workflow to refine the distribution of a species from its extent of occurrence (EOO) to area of habitat (AOH) within the species range map. The range maps are produced with an inverse distance weighted (IDW) interpolation procedure using presence and absence points derived from primary biodiversity data.
Location
The Americas (North, South, Central America and the Caribbean).
Methods
As a case study, we mapped the distribution of 723 resident forest birds in the Americas and assessed their performance in comparison with expert-drawn range maps. We evaluated differences in accuracy, spatial overlap, range map size and derived AOH.
Results
The geospatial workflow generated IDW range maps with a higher overall accuracy (87% versus 62%) and fewer errors of omission (<1%) and commission (14%) than the expert range maps (28% both errors). The spatial overlap between both datasets was low (35%), but the agreement increased in areas of high probability of occurrence (68%). We did not find significant differences in range size, but the AOH derived from the expert-drawn range maps was consistently smaller than the estimates from the IDW range maps.
Main Conclusions
Our geospatial workflow provides a straightforward approach to accurately map species ranges and the estimation of area of habitat (AOH) for conservation planning and decision-making. Conversely, procedures that refine expert-drawn range maps to obtain AOH risk producing biased estimates for local-scale applications.
1 INTRODUCTION
Mapping species distributions is crucial to support conservation actions in the current extinction crisis (Bachman et al., 2019). The International Union for Conservation of Nature (IUCN) Red List of Threatened Species employs two spatial metrics that represent species distributions and support extinction risk assessments (IUCN, 2012): extent of occurrence (EOO) and area of occupancy (AOO). These metrics represent the upper and lower bounds of a species distribution and have a different theoretical basis—the former measures the degree of risk spread and the latter is closely linked to population size (Gaston & Fuller, 2009). Nevertheless, the IUCN has recommended that for planning conservation actions, other metrics might be more appropriate (IUCN/SPC, 2019; IUCN/SSC, 2018).
Recently, Brooks et al. (2019) suggested that IUCN Red List assessments should incorporate area of habitat (AOH), defined as the area of potentially suitable ecological conditions for a species within its range, also known as extent of suitable habitat (KBA/SAC, 2019). AOH is relevant to guide conservation by quantifying habitat loss and fragmentation within a species range (Brooks et al., 2019) and is already part of the criteria for identifying Key Biodiversity Areas (KBA) (IUCN, 2016).
To map AOH, researchers have relied mostly on refining expert-drawn range maps by clipping unsuitable areas based on published elevational limits and known habitat preferences (Harris & Pimm, 2008; Ocampo-Peñuela et al., 2016). This approach has been useful to inform the conservation status of different terrestrial organisms on a global (Ficetola et al., 2015; Rondinini et al., 2011; Tracewski et al., 2016) and local scale (Negret et al., 2021). However, expert-drawn range maps can have low accuracy due to errors of omission (known presences outside of the mapped area) and commission (known absences inside the mapped area) (Beresford et al., 2011; Mainali et al., 2020; Peterson et al., 2018). This is problematic at local scales where it is necessary to avoid mischaracterization of species distributional patterns for conservation applications (Ficetola et al., 2014; Mainali et al., 2020; Rahbek, 2005).
Because the AOH is a subset of the range (Brooks et al., 2019), the critical step to reliably map species AOH is to obtain accurate species range maps, which depict “the current known limits of distribution of a species, accounting for all known, inferred or projected sites of occurrence” (IUCN/SSC, 2019; KBA/SAC, 2019). There are different methods to produce species range maps, and it is crucial to understand the limitations and trade-offs of the different alternatives (Cantú-Salazar & Gaston, 2013; Graham & Hijmans, 2006; Maréchaux et al., 2017).
A common approach is to use species distribution models (SDM) that correlate environmental variables with known occurrences (Elith & Leathwick, 2009). These methods have been successfully applied to map species ranges and inform Red List assessments (Breiner et al., 2017; Pena et al., 2014; Syfert et al., 2014). However, SDMs are challenging to implement for multiple species, with modelling decisions that may be idiosyncratic and, if not well understood or explicitly communicated, can cause reduced transparency and reproducibility (Feng et al., 2019; Guisan et al., 2013; Sofaer et al., 2019). Additionally, methodological shortcomings could cause overprediction of species distributions and optimistic assessments of extinction risk (Velazco et al., 2020). Thus, there is a need for developing more straightforward approaches to produce accurate range maps especially for hundreds (or more) species (García-Roselló et al., 2019).
Another approach to map species distributions for conservation applications is spatial interpolation (Bahn & McGill, 2007; Li & Heap, 2008). Of the different types of interpolation available such as kriging, spline, or inverse distance weighting (IDW), the latter one is a deterministic method that is accessible, fast, user-friendly and could be readily employed to develop species range maps. IDW uses known presences and absences to produce a surface of probability values for the occurrence of a species within an area (Hijmans & Elith, 2019). The main assumption is that species are more likely to be found closer to presence points and farther from absences (i.e. spatial autocorrelation), and the local influence of points (weighted average) diminishes with distance. The ranges produced with IDW are consistent with the observation that distributions are naturally porous and have discontinuities (holes) at higher resolutions (Hurlbert & Jetz, 2007).
Inverse distance weighted has been used for mapping invasive plants (Roberts et al., 2004) and to understand the distribution of coral reef sessile organisms with better performance than ordinary kriging (Zarco-Perello & Simões, 2017). Moreover, distribution maps generated with this method have been found to largely overlap with SDMs derived from Maxent for hyper-dominant Amazonian tree species (Gomes et al., 2018). This overlap occurs when the environmental variables used for correlative modelling do not provide more explanatory power than the spatial structure of the point data (Bahn & McGill, 2007; Hijmans, 2012; Journé et al., 2020). Similarly to SDMs, interpolation with IDW benefits from larger sample sizes and unbiased spatial data, so procedures that reduce spatial bias and class imbalance between presence and absence points are recommended (Steen et al., 2021).
Here, we developed a geospatial workflow to map species distributions with presence and absence points derived from primary biodiversity data. To evaluate our workflow, we mapped the distribution of 723 forest birds in the Americas and assessed its performance in comparison with the range maps of Birdlife International. Our intent was to study differences and trade-offs to aid conservationists take informed decisions when using mapping alternatives (O’Hara et al., 2017). Overall, by analysing differences in accuracy, spatial overlap, range size, and AOH estimates, we present an alternative approach that is suitable to map species distributions at local scales for conservation planning and decision-making.
2 METHODS
2.1 Geospatial workflow
We developed a geospatial workflow that uses presence and absence data to refine the distribution of a species from its extent of occurrence (EOO) to a final area of habitat (AOH) map within the species range. Users should first gather occurrence data, and we suggest following recent guidelines (Araújo et al., 2019; Feng et al., 2019). Absences can be derived from known surveys, monitoring schemes, or citizen science projects that provide a measure of sampling effort (Johnston et al., 2021).
- Draw the extent of occurrence (EOO) with a minimum convex polygon (MCP) around presence points. The MCP is recommended to represent the EOO over other alternatives (e.g. alpha hulls) because it can be applied consistently across taxa and produces conservative estimates (Joppa et al., 2016). We recommend following strict IUCN guidelines and avoid buffering the MCP as it risks overestimation of the EOO (IUCN/SSC, 2019).
- Clip the EOO to the elevational limits of a species by extracting the elevation of each occurrence point using a Digital Elevation Model (DEM). Elevational limits from point data provide better estimates than gathering from the literature (Rotenberry & Balasubramaniam, 2020) but it is advisable to filter for elevational outliers beforehand (see case study workflow). The choice of DEM resolution depends on the spatial extent of the data and the intended analysis; 90-m and 250-m resolutions are commonly employed (Amatulli et al., 2018)
- Overlay presence and absence points to the clipped EOO. These are the data points that will be used for generating the range maps.
- Interpolate presence and absence points using inverse distance weighting (IDW) within the clipped EOO. The output is a raster surface within the clipped EOO where each cell has a value between 0 and 1. The species range is obtained by using a threshold to convert probability of occurrence to a binary map where the species is present or absent (Liu et al., 2005). Choosing a threshold depends on the map's intended purpose and a default cut-off of 0.5 is not recommended (Freeman & Moisen, 2008; Jiménez-Valverde & Lobo, 2007).
- Clip a habitat layer inside the IDW range map to remove unsuitable areas and derive area of habitat (AOH) (KBA/SAC, 2019). We suggest following the IUCN Habitat Classification Scheme for consistency. Users could use empirically developed products [e.g. Global Habitat Type Map—Jung et al. (2020)] which spatially map IUCN habitat types or use expert criteria to select land cover classes from other classifications that match the definition of species habitats coded by the IUCN.

2.2 Case study
We implemented our geospatial workflow using a dataset of 723 resident forest landbirds in the Americas (North, South, Central America, and the Caribbean)—20% of all forest birds in the region according to the BirdLife Datazone (BirdLife International, 2021).
Our species selection focused on forest birds with smaller ranges (<1 M km2; [Jenkins et al., 2013; Orme et al., 2005]) because they are typically associated with increased risk of extinction (Chichorro et al., 2018; Lucas et al., 2019). On the other end, we excluded species classified as Extinct (EX), Extinct in the Wild (EW), and possibly extinct (CR- PE) based on the 2021 IUCN Red List. We also excluded species endemic to the Galapagos, Cocos, and Hawaiian Islands. Furthermore, after data processing (next two sections), we removed species with less than 15 presences and absences (Papeş & Gaubert, 2007; Proosdij et al., 2016).
2.2.1 Gathering and processing occurrence data
We gathered occurrence data from the Global Biodiversity Information Facility (GBIF; https://www.gbif.org). GBIF is an aggregator of multiple occurrence datasets spanning from museum specimen records to the eBird and iNaturalist citizen science projects, and it is extensively used for mappings species distributions (Heberling et al., 2021). We generated a GBIF occurrence data download (https://doi.org/10.15468/dl.rrrrf3) on 01 July 2021 with the R package rgbif (Chamberlain & Boettiger, 2017) for an initial list of 1,696 species that fulfilled our species selection criteria.
The GBIF download was restricted to occurrences starting in the year 2000 to derive conservative estimates of 21st-century distributions. Filters were set for data with no geospatial issues and basis of record (i.e. type of evidence) as “observation,” “human observation,” “literature,” “preserved specimen” or “material sample.” The occurrence data from GBIF follow a stable taxonomic backbone that is not aligned with the taxonomy used in BirdLife (Burfield et al., 2017). Hence, we needed to confirm that taxonomic names in both datasets actually relate to the same species concept (McClure et al., 2020). Thus, we only included species where the GBIF occurrence points and the BirdLife distribution maps broadly covered the same geographical regions based on a visual assessment.
We employed a two-stage cleaning protocol to improve the quality of our study dataset (Panter et al., 2020). First, we removed duplicates and used the “clean_coordinates” function of the R package CoordinateCleaner (Zizka et al., 2019) with default settings to automatically filter common erroneous coordinates, such as those assigned to the sea, country capitals, or biodiversity institutions (Maldonado et al., 2015). Second, for each species we examined the spatial arrangement of the occurrence points and removed obvious geographical outliers using the interactive modular R platform Wallace (Kass et al., 2018). To reduce spatial bias in the occurrence data, we kept records at least 5 km apart by subsampling within equal-area hexagon cells (Johnston et al., 2021; Strimas-Mackey et al., 2020).
2.2.2 Deriving absences
We developed an approach to identify absences based on eBird hotspots—publicly accessible locations that can aggregate observations from multiple independent checklists over time (Sullivan et al., 2009). Here, we deemed absences as eBird hotspots where a given species has never been recorded. We used this approach as a stringent criterion to minimize the uncertainties regarding absences (Lobo et al., 2010).
We extracted hotspots from the eBird Basic Dataset (eBird Basic Dataset, 2020) for 57 countries and territories in the Americas with the R package auk (Strimas-Mackey et al., 2016). We kept hotspots that had at least 10 complete eBird checklists (La Sorte & Somveille, 2020) that had a “stationary” or “traveling” survey protocol, with a duration <5 hr, distance <5 km and with up to 10 observers (Johnston et al., 2021). We obtained 73,634-point locations of hotspots which were used to identify absences on a species-by-species basis.
2.2.3 Case study workflow
- We drew each species EOO around the occurrence data as the minimum convex polygon (MCP).
- We clipped the EOO to each species' elevational limits by extracting elevational values from the occurrence points. The elevational data were gathered using the srtm 4.1 DEM at 250-m resolution (Jarvis et al., 2008). To obtain conservative estimates, we removed the extreme low and high values outside the 95% elevational range limit (Neate-Clegg et al., 2021).
- We overlaid presence and absence points to the clipped EOO. Then, we removed absences within a 5-km buffer of any presence points. This accounted for the geoprecision of the eBird hotspots from which we derived the absences (Johnston et al., 2021; Strimas-Mackey et al., 2020).
- We interpolated presence and absence points using inverse distance weighting (IDW) in the R package dismo (Hijmans et al., 2017). We transformed the continuous IDW surface into a species range map (presence/absence) following a fivefold cross-validation procedure in which each fold is used for training and testing at least once, generating five different model partitions (required species with at least 15 presences and absences). We then averaged the threshold that maximizes overall accuracy as a binary cut-off—see accuracy assessment; section 2.3.2.
- To derive AOH, we clipped a forest layer inside each species range map. We used a recently developed 50-m global forest/non-forest layer from the TanDEM-X satellite (Martone et al., 2018). This product is designed to specifically estimate forest cover instead of other approaches that are more focused on evaluating forest cover change (e.g. Global Forest Watch). We aggregated to 250 m with a nearest neighbour assignment to match the resolution of our elevational data.
2.2.4 Expert-drawn range maps
We gathered expert-drawn range maps from BirdLife International and HBW (2020). Each bird species is represented by polygons coded with different attributes that describe a species distribution. For each forest species in our list, we extracted polygons coded as presence = Extant; Origin = Native; Season = Resident. These settings excluded parts of the range where a species might be introduced, or known to be extirpated. We then dissolved these polygons to represent each species range into a single spatial feature (Cantú-Salazar & Gaston, 2013).
To estimate area of habitat (AOH) from the BirdLife range maps, we refined the polygon of each species using published elevational limits and forest cover (Ocampo-Peñuela et al., 2016). Species elevational limits were obtained from the IUCN Red List and if missing supplemented with information from Cornell's Birds of the World (https://www.birdsoftheworld.org). We derived the AOH map for the BirdLife ranges using the same procedure as in the case study workflow (step 5; previous section).
2.3 Analysis
2.3.1 Accuracy analysis
We evaluated the performance of the IDW and expert-drawn range maps using two different facets of accuracy: reliability and discriminant capacity (Leroy et al., 2018). Reliability measures if the areas depicted by a range map agree (or match) with areas of high probability of occurrence (Pearce & Ferrier, 2000) whereas the discriminant capacity is the ability to distinguish between presences and absences (Liu et al., 2011).
To evaluate discriminant capacity, we created binary IDW range maps (section 2.2.3) using 70% of the presences and absences, leaving the remaining 30% for testing. The expert range maps were evaluated against the same set of independent testing data. We used three basic metrics of discrimination performance (Figure 2 for reference): omission errors (false negatives), commission errors (false positives), and overall accuracy (Anderson et al., 2003). The overall accuracy was calculated using Jaccard's similarity index, which summarizes the ability of the mapped range to (a) maximize true presences, (b) reduce both false negatives and false positives and (c) disregard true negatives that are easily classified because of prevalence (Leroy et al., 2018; Li & Guo, 2013).

A caveat to the accuracy assessment based on data points is that it is limited by the availability of public data and it cannot account for the performance of a range map in undersampled areas. This means that the expert-drawn range maps could exhibit a higher accuracy than what the evaluation of discriminant capacity suggests, provided their predictions correctly point to areas where a species is actually present or absent. Thus, to alleviate this limitation we assessed the reliability between the expert-drawn range map predictions and the IDW continuous probability surface. For this, we used the expert score of Mainali et al. (2020). Higher scores indicate that the expert range maps predict presences in areas of high probability of occurrence (even if sampling is scarce or not available) while avoiding areas of low probability (see Mainali et al., 2020 for details).
2.3.2 Statistical analysis
We calculated average values of the accuracy metrics using 20% trimmed means as a robust measure of centrality and estimated 95% confidence intervals using a percentile bootstrap procedure (Wilcox, 2017). We further examined how the overall accuracy of the expert-drawn range maps influenced spatial overlap with the IDW range maps using a robust linear regression. For this, we first computed the spatial overlap between both sets of binary range maps as a ratio of the area of intersection to the area of union (Mainali et al., 2020).
We tested for a statistically significant difference in range size between both datasets with a two-sample Kolmogorov–Smirnov test (K–S test). We also evaluated differences across the entire distribution of range size values. We used a shift function with the rogme package in R (Rousselet et al., 2017) that allowed us to quantify differences between deciles of the expert-drawn and geoworkflow range maps (Figure 3). The same analyses were conducted for evaluating differences in AOH. We then computed a robust log-linear regression with an MM-type estimator in the R package robustbase (Maechler et al., 2019) to examine how omission and commission errors influence range size in the expert-drawn range maps.

3 RESULTS
3.1 Accuracy and range map overlap
The 20% trimmed mean of omission errors for the expert-drawn range maps was 0.28 (95% percentile bootstrap CI = 0.26–0.3; p < .001), and for the IDW range maps, it was substantially lower (0.06; 95% CI = 0.05–0.06; p < .001). Commission errors were also lower in the IDW range maps (0.14; 95% CI = 0.13–0.15, p < .001) than for the expert-drawn range maps (0.28; 95% CI = 0.25–0.3; p < .001). The overall accuracy of the IDW range maps was high (0.87; 95% CI = 0.87–0.88, p < .001) whereas for the expert-drawn range maps was only moderate (0.62; 95% CI = 0.6–0.63; p < .001). The results are visualized in Figure 3, and the accuracy metrics for each of the species analysed are available in Appendix S1.
The spatial overlap between both datasets was only 35% (95% CI = 33%–36%; p < .001). A robust regression showed that for the expert-drawn range maps, the overlap increases with increasing overall accuracy (coef = 0.49; SE = 0.03; t = 18.74; p < .001). Notwithstanding, we found the reliability of the expert-drawn range maps was fairly high, with their delineated areas matching areas of high probability of occurrence in the IDW surface (expert score 68%; 95% CI = 0.66–0.70; p < .001).
3.2 Range size and AOH estimates
For both sets of range maps (n = 723 each), range sizes spanned from 1,667 km2 to 4,336,938 km2. We did not find a significant difference in the distributions of range sizes (Figure 4a; K–S test = 0.05, p = .365) but there was a different distribution of AOH size (Figure 4b; K–S test = 0.20, p < .001). The robust linear regression showed that for the expert range maps, omission errors had a greater influence on decreasing range size (coef = −0.82, SE = 0.12, t = −6.8, <0.001) than the effect of commission errors on increasing it (coef = 0.5, SE = 0.08, t = 6.39, <0.001).

A shift function showed the differences between the deciles of the expert and IDW range maps did not differ significantly (Figure 4c) but we found the expert-derived AOH maps are consistently smaller across the entire distribution of values (Figure 4d). Thus, the typical AOH derived from the expert range maps was substantially smaller than the AOH obtained from the IDW range (expert range map tmean = 98,838 km2; IDW range map tmean = 58,019 km2) with an estimated mean difference of 40,819 km2 (95% CI: 30,503–51,163 km2, p < .001). On average, the AOH from our workflow represented 63% of the range, whereas for the AOH derived from the expert maps was 33%.
4 DISCUSSION
We developed a reproducible geospatial workflow to map species distributions starting from primary biodiversity data that we tested for 723 resident forest birds in the Americas. We used inverse distance weighting (IDW) interpolation with presence data from GBIF and obtained absence records from eBird hotspots. Our results showed that on average, the geospatial range maps had an improved accuracy over expert range maps, which have previously been shown to exhibit substantial errors of omission and commission (Hurlbert & Jetz, 2007; Jetz et al., 2008; Mainali et al., 2020; Peterson et al., 2018). In consequence, the derived AOH maps from our workflow can be used more reliably for activities such as establishing protected areas, designing habitat corridors, or identifying critical areas for conservation actions.
Our approach is useful for local-scale applications but is not intended for every scenario—there are no “silver bullets” to represent species distributions for every purpose (Guillera-Arroita et al., 2015; Qiao et al., 2015). The generated maps stay close to the known data, which is useful for systematic conservation planning (Margules & Pressey, 2000) but not for exploration of areas likely to contain a species. Furthermore, our range maps are not suitable for macroecological studies (e.g. species richness or endemism) where coarser spatial resolutions are recommended (Hurlbert & Jetz, 2007). The expert maps are more appropriate in this situation because their delineated areas characterize the main core of species distributions at a broad scale, based on measuring their agreement with areas of high probability of occurrence in the IDW continuous surface of our workflow (Mainali et al., 2020).
On the other hand, the derived estimates of AOH from the expert-drawn maps were consistently smaller than our estimated AOH despite being generated from similar range sizes (Figure 4). We attribute this result to a drastic reduction in area that occurs when refining the expert maps twice: first by published elevational limits and then by forest cover (Peterson, 2017). Moreover, this approach risks mischaracterization of species distributional patterns at local scales and critical populations might be left out of the mapped AOH (Peterson et al., 2018; Figure 5). We therefore do not recommend refining expert maps without a previous assessment of their accuracy. In turn, the maps of AOH derived from the geoworkflow range maps can be more readily used for conservation planning and decision-making.

The geospatial workflow emphasized confirmed records and excluded areas of known absence which increased the spatial accuracy of predictions (Elith et al., 2006; Mainali et al., 2020). Besides the more reliable estimates from using primary biodiversity data (Peterson et al., 2018; Rotenberry & Balasubramaniam, 2020), our approach has the advantage that it can be adjusted based on user needs, and experts can review the produced range maps for quality checks to further improve their accuracy (Graham & Hijmans, 2006; Merow et al., 2017; Velásquez-Tibatá et al., 2019). This can facilitate the integration of mapping species ranges with extinction risk assessments for the IUCN Red List, especially since our protocol starts with defining species extent of occurrence (EOO), which serves both as a metric under criterion B of the Red List and as the limits of the mapping area for a given species.
In the urgency to evaluate the conservation status of as many species as possible and implementing measures to halt biodiversity loss, our geospatial workflow is a valuable addition to the toolkit of conservation practitioners and decision-makers as it is consistent and easily applicable. Notwithstanding, we acknowledge the scarcity of spatial data is a limiting factor and our approach might not adequately map ranges for severely under sampled groups, or those from which it is not yet possible to derive absence data. Only with more field explorations and carefully designed biological surveys, such as standardized camera trapping surveys, will we overcome the scarce knowledge of species distributions around the globe for effective on-the-ground conservation actions (Wilson, 2017).
ACKNOWLEDGEMENTS
We thank S.L. Pimm and R. Huang for suggestions during the initial development of the geospatial workflow. We thank N. Hazzi and M. Di Marco for useful comments and edits.
CONFLICT OF INTEREST
The authors declare that there is no conflict of interest.
REFERENCES
BIOSKETCH
Ruben Dario Palacio is the founder and science director of Fundacion Ecotonos (www.ecotonos.org), a Colombian non-profit dedicated to the conservation of biodiversity in the tropical Andes. He works in the fields of ornithology, restoration ecology and conservation biology.
Author contributions: R.D.P. conceived and developed the geospatial workflow, conducted the analysis and drafted the manuscript. P.J.N., J.V-T and A.P.J. provided critical input to the analysis and contributed with extensive edits. All authors gave final approval for publication.